Using Transfer Learning to Assist Exploratory Corpus Annotation
Abstract
We describe an under-studied problem in language resource management: that of providing automatic assistance to annotators working in exploratory settings. When no satisfactory tagset already exists, such as in under-resourced or undocumented languages, it must be developed iteratively while annotating data. This process naturally gives rise to a sequence of datasets, each annotated differently. We argue that this problem is best regarded as a transfer learning problem with multiple source tasks. Using part-of-speech tagging data with simulated exploratory tagsets, we demonstrate that even simple transfer learning techniques can significantly improve the quality of pre-annotations in an exploratory annotation.
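As a rough illustration of the setting described above, the following minimal sketch pre-annotates text in a new tagset by reusing a tagger trained on an earlier tagset. It is not the method evaluated in the paper: the toy sentences, both tagsets, and the names `train_word_tagger` and `train_transfer_tagger` are hypothetical, and the transfer step is simply a learned mapping from source-tagset predictions to target tags, used as a back-off for words unseen in the new annotations.

```python
from collections import Counter, defaultdict


def train_word_tagger(tagged_sents):
    """Most-frequent-tag-per-word model, returned as a dict word -> tag."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}


def train_transfer_tagger(source_model, target_sents):
    """Build a tagger for the new tagset that backs off to the old tagger.

    Known words get their most frequent target tag. Unknown words are
    tagged by the old (source) model, and that tag is mapped to the
    target tag it most often co-occurred with in the new annotations.
    """
    word_model = train_word_tagger(target_sents)
    by_source_tag = defaultdict(Counter)
    overall = Counter()
    for sent in target_sents:
        for word, tag in sent:
            overall[tag] += 1
            src = source_model.get(word)
            if src is not None:
                by_source_tag[src][tag] += 1
    tag_map = {s: c.most_common(1)[0][0] for s, c in by_source_tag.items()}
    default = overall.most_common(1)[0][0]

    def tag(words):
        out = []
        for w in words:
            if w in word_model:
                out.append(word_model[w])
            elif source_model.get(w) in tag_map:
                out.append(tag_map[source_model[w]])
            else:
                out.append(default)
        return out

    return tag


# Hypothetical data: an earlier annotation round (old tagset) ...
source_sents = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("the", "DET"), ("bird", "NOUN"), ("sings", "VERB")],
]
# ... and a smaller, later round annotated with a revised tagset.
target_sents = [
    [("the", "D"), ("dog", "N"), ("barks", "V")],
    [("a", "D"), ("bird", "N"), ("sings", "V")],
]

source_model = train_word_tagger(source_sents)
target_only = train_word_tagger(target_sents)
tagger = train_transfer_tagger(source_model, target_sents)

# "cat" and "sleeps" never appear in the new annotations, so a tagger
# trained on the target data alone has no entry for them.
print(tagger(["the", "cat", "sleeps"]))  # ['D', 'N', 'V']
```

With more than one earlier annotation round, the same idea could be repeated per source model, which is one simple way to read "multiple source tasks"; how the sources should be combined is left open here.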
Similar resources
Towards Faster Annotation Interfaces for Learning to Filter in Information Extraction and Search
This work explores the design of an annotation interface for a document filtering system based on supervised and semisupervised machine learning, focusing on usability improvements to the user interface to improve the efficiency of annotation without loss of precision, recall, and accuracy. Our objective is to create an automated pipeline for information extraction (IE) and exploratory search f...
Solving the AL Chicken-and-Egg Corpus and Model Problem: Model-free Active Learning for Phenomena-driven Corpus Construction
Active learning (AL) is often used in corpus construction (CC) for selecting “informative” documents for annotation. This is ideal for focusing annotation efforts when all documents cannot be annotated, but has the limitation that it is carried out in a closed loop, selecting points that will improve an existing model. For phenomena-driven and exploratory CC, the lack of existing models and spe...
Treebank Development with Deductive and Abductive Explanation-based Learning: Exploratory Experiments
In pace with the success of corpus-based approaches to theoretical and computational linguistics, the collocation of corpora has evolved into a research activity in its own right. As the currently available corpora either lack annotation depth or closure, more data will be annotated in the future, preferably with minimal human intervention. This paper tries to approach the problem of treebank develop...
European Association for Computer Assisted Language Learning THE EUROCALL REVIEW
BACKBONE is a European LLP/Languages project (1) (Jan 2009 - Feb 2011), whose overall objective is to provide foreign language teachers in CLIL settings with innovative language learning solutions. To achieve this goal, pedagogic corpora of spoken interviews are combined with corpus-related e-learning activities in blended learning scenarios. The seven BACKBONE corpora contain video interviews in...
Semi-Automatic Sign Language Corpora Annotation using Lexical Representations of Signs
Nowadays much research focuses on the automatic recognition of sign language. High recognition rates are achieved using a lot of training data. This data is generally collected by manually annotating SL video corpora. However, this is time-consuming and the results depend on the annotators' knowledge. In this work we intend to assist the annotation in terms of glosses which consist of writing down t...